The efficiency of reinforcement learning algorithms depends critically on a few meta-parameters that modulate the learning updates and the trade-off between exploration and exploitation. The adaptation of the meta-parameters is an open question in reinforcement learning, which arguably has become more of an issue recently with the success of deep reinforcement learning in high-dimensional state spaces. The long learning times in domains such as Atari 2600 video games make it infeasible to perform comprehensive searches for appropriate meta-parameter values. We propose the Online Meta-learning by Parallel Algorithm Competition (OMPAC) method. In the OMPAC method, several instances of a reinforcement learning algorithm are run in parallel with small differences in the initial values of the meta-parameters. After a fixed number of episodes, the instances are selected based on their performance in the task at hand. Before the learning continues, Gaussian noise is added to the meta-parameters with a predefined probability. We validate the OMPAC method by improving the state-of-the-art results in stochastic SZ-Tetris and in standard Tetris with a smaller, 10$\times$10, board, by 31% and 84%, respectively, and by improving the results for deep Sarsa($\lambda$) agents in three Atari 2600 games by 62% or more. The experiments also show the ability of the OMPAC method to adapt the meta-parameters according to the learning progress in different tasks.
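The select-and-perturb loop described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: the population size, truncation-based selection rule, noise scale, and the toy meta-parameters (a learning rate `alpha` and a trace-decay factor `lam`) are assumptions for the sake of the example.

```python
import random

def ompac(run_episodes, n_instances=8, generations=5,
          noise_prob=0.2, noise_std=0.1, seed=0):
    """Evolve meta-parameters across parallel training instances.

    run_episodes(params) -> score: trains one algorithm instance for a
    fixed number of episodes and returns its task performance.
    (Selection rule and noise scale are illustrative assumptions.)
    """
    rng = random.Random(seed)
    # Each instance starts from slightly different initial meta-parameters.
    population = [{"alpha": 0.5 + rng.gauss(0, 0.05),
                   "lam": 0.8 + rng.gauss(0, 0.05)}
                  for _ in range(n_instances)]
    best = None
    for _ in range(generations):
        # Run all instances for a fixed number of episodes and rank them.
        scored = sorted(((run_episodes(p), p) for p in population),
                        key=lambda sp: sp[0], reverse=True)
        if best is None or scored[0][0] > best[0]:
            best = scored[0]
        # Selection: keep the better half and duplicate it to refill.
        survivors = [p for _, p in scored[: n_instances // 2]]
        population = [dict(p) for p in survivors + survivors]
        # With a predefined probability, add Gaussian noise to each
        # meta-parameter before learning continues.
        for p in population:
            for k in p:
                if rng.random() < noise_prob:
                    p[k] += rng.gauss(0, noise_std)
    return best  # (score, meta-parameters) of the best instance seen
```

With a toy objective in place of actual training (e.g. `lambda p: -(p["alpha"] - 0.3) ** 2 - (p["lam"] - 0.9) ** 2`), the loop drifts the population's meta-parameters toward the better-scoring region, mirroring how OMPAC adapts meta-parameters online as learning progresses.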